ABSTRACT

Between 6 and 30% of human and mouse transcripts 
are initiated from transposable elements. However, 
the promoters driving such transcriptional activity 
are mostly unknown. We experimentally characterized 
an antisense (AS) promoter in mouse L1 retrotrans- 
posons for the first time, oriented antiparallel to the 
coding strand of L1 open reading frame-1. We found 
that AS transcription is mediated by RNA polymerase 
II. Rapid amplification of cDNA ends cloning mapped 
transcription start sites adjacent to the AS promoter. 
We identified >100 novel fusion transcripts, of which 
many were conserved across divergent mouse 
lineages, suggesting conservation of potential func- 
tions. To evaluate whether AS L1 transcription could 
regulate L1 retrotransposition, we replaced portions 
of native open reading frame-1 in donor elements 
by synonymously recoded sequences. The resulting
L1 elements lacked AS promoter activity and
retrotransposed more frequently than endogenous
L1s. Overexpression of AS L1 transcripts also
reduced L1 retrotransposition. This suppression of
retrotransposition was largely independent of Dicer.
Our experiments shed new light on how AS fusion
transcripts are initiated from endogenous L1
elements across the mouse genome. Such AS
transcription can contribute substantially both to natural
transcriptional variation and to endogenous
regulation of L1 retrotransposition.

INTRODUCTION 

Long interspersed elements (LINEs, L1s) are a major class
of mammalian retrotransposons, comprising ~19 and 21%
of the mouse and human genomes, respectively (1,2).
Approximately half of the mammalian genome has
resulted from L1-mediated mobilization (3). Ongoing,
endogenous L1 retrotransposition has caused widespread
genomic structural variation between mouse strains (4,5)
and between human individuals (6,7), and also causes
somatic variation both in normal development and in
certain human cancers (8). Full-length L1s (~6.0 kilobases
in human, ~7.0 kb in mouse) contain an internal
sense-stranded promoter in the 5' untranslated region (UTR),
two open reading frames (ORF1 and ORF2) and a 3'
UTR with a poly(A) tail (3). ORF1 encodes a nucleic
acid-binding chaperone protein (9,10), whereas ORF2
encodes an endonuclease (11), reverse transcriptase and a
zinc finger-like protein (12). Both ORFs are required for
autonomous retrotransposition (13). Thousands of full-length
elements in three young L1 subfamilies (Tp, Gp and A)
reside in the mouse genome (4,14,15). The mouse L1
subfamilies are defined by differences in their 5' UTR
monomeric repeats. ORF2 contains the fewest nucleotide
variants, whereas the 3' UTR has the most (15). Members
of each subfamily have integrated into the mouse genome
after the evolutionary split between rat and mouse. Many
L1 Tp, Gp and A integrants are polymorphic, reflecting
recent ongoing retrotransposition (4,5,15).

The myriad potential biological impacts of endogenous 
transposable elements (TEs) in human and mouse appear 
to depend on their genomic context, their sequence struc- 
ture and other factors (16). Endogenous TEs have been 
shown to affect neighboring gene expression in various 
ways. For example, they have been reported to initiate a 
surprising number, between 6 and 30%, of human and 
mouse transcripts (17). In humans, an active antisense
(AS) L1 promoter in the 5' UTR of full-length L1s
initiates expression of numerous distinct AS L1
retrotransposon-initiated fusion transcripts (RIFTs), thereby
contributing to and modifying the expression of numerous
neighboring genes (18-22). As a majority of full-length
intragenic human L1s are oriented AS to flanking genes'
ORFs (23,24), resulting AS L1 RIFTs frequently include
downstream spliced exons expressed in the canonical sense
orientation. Other human AS L1 RIFTs are noncoding
(25-27). Mouse endogenous retroviruses have been
shown to disrupt overlapping gene expression (28,29).
Human L1s may affect expression of overlapping genes,
including the Met proto-oncogene (30) and others (31).

Like other mammalian TEs, L1s are constrained by
various cellular defenses including DNA methylation,
histone modifications, Dicer-mediated RNA interference
(RNAi) and other small RNA-mediated effects (32-37).
Bidirectional promoters within the human L1 5' UTR,
i.e. the internal sense and AS promoters (20), ~500
nucleotides apart, can initiate double-stranded transcripts
that can be processed to small interfering RNAs
(siRNAs) by Dicer (34,38). Single-stranded transcripts
also can be processed to small RNAs, regardless of
whether they are initiated within or outside of L1
elements. Resulting L1-specific small RNAs could
mediate transcriptional and/or posttranscriptional gene
silencing (34,36,39-42). Both sense and AS transcripts
mapping to the 5' end of full-length mouse L1 elements
are expressed in mouse embryonic stem (ES) cells (43,44).
Mouse chimeric transcripts containing AS L1 sequences
also have been identified (4,45). Together, these results
suggest that mouse L1 elements also may contain one or
more AS promoters. However, despite identification of AS
L1 RIFTs in mouse testis and of sense and AS transcripts
in mouse ES cells, a putative AS promoter has not been
experimentally validated until now. Moreover, neither its
activity in other tissues nor its possible biological roles
have been described. Here, we identified an active
mouse AS L1 promoter within ORF1, immediately
proximal to AS L1 RIFTs' transcription start sites
(TSS). We found that the resulting AS mouse L1 RIFTs,
including spliced, unspliced and many noncoding RNAs,
were initiated by interspersed L1s genome-wide. Our
results indicate that AS L1 RIFTs contribute to the
diverse transcriptome (including long noncoding RNAs)
expressed in various tissues (25,26,46). AS transcription
also helps to limit mouse L1 retrotransposition through
a Dicer-independent mechanism (42).



MATERIALS AND METHODS 

Mouse colony, cell lines and isolation of genomic 
DNA and RNA 

Mice were maintained and euthanized according to 
approved Institutional Animal Care and Use Committee 
protocols (National Cancer Institute, Frederick, MD, 
USA; Ohio State University, Columbus, OH, USA). 
Mouse strains and purified genomic DNA were purchased 
from the Jackson Laboratory (Bar Harbor, ME, USA). 
A mouse spermatocyte cell line (CRL2196) was purchased 
from the American Type Culture Collection. HeLa cells 
were provided by Dr John V. Moran (University of 
Michigan). HCT116 Dicer ex5 knockout cells were 
provided by Dr Bert Vogelstein (Johns Hopkins). 

Genomic DNA and pooled total RNAs were isolated 
from CRL2196 cells and from various tissues, ages and 
lineages of mice as indicated, using standard methods and 
Trizol (Invitrogen), respectively. 

Oligonucleotide sequences 

Primer sequences and annotations are listed in
Supplementary Table S1.

Candidate promoter activity assays 

Genomic DNA fragments representing four mouse L1
subfamilies (14) (Tp, GenBank accession number AF016099;
Gp, AC068252; A, AY053456 and FIII, AC002406) and a
synthetic L1 element smL1 (47) were amplified by PCR
using Platinum Taq HiFi (Invitrogen) and forward and
reverse primers incorporating BglII and NcoI restriction
sites, respectively (Supplementary Table S1). Amplicons
included fragments of L1 Tp (represented by L1spa in
pTN201), sense promoter (primers DES1212 and
DES1213), AS promoter (DES1218 x DES1220,
DES1218 x DES1221) and AS ORF2 (DES1459 x
DES1460). Promoter candidates were cloned directionally
upstream of TEM1, a β-lactamase reporter gene, in
pBLAK-b, which lacks a promoter (Invitrogen) (Figure 1
and Supplementary Figure S1). They were confirmed by
sequencing. One microgram of BglII-digested (linearized)
plasmid DNA was transiently transfected into CRL2196 or
HeLa cells using FuGENE 6 (Roche). As positive and
negative controls, plasmids with and without the SV40
promoter upstream of TEM1 were used (pBLAK-c and
pBLAK-b, respectively).

Figure 1. Mapping an active AS promoter within L1 ORF1. (A) Schematic representation of an L1 Tp subfamily retrotransposon, L1spa, with
coordinates indicated as used throughout this article. L1spa was identified in GenBank accession no. AF016099. Below: Probes for phage cDNA
library hybridization against ORF2 (2858-3269 nt) and ORF1 (1814-2101 nt). (B) Various DNA fragments were directionally engineered upstream of
a promoterless reporter gene, i.e. β-lactamase TEM1. (C) Linearized DNAs containing various candidate promoter-reporter cassettes were
transfected into HeLa cells. Functional β-lactamase protein expression was measured by staining cells with CCF2-AM, whose fluorescence emission
shifts from green to blue on increased enzymatic cleavage (48). Cells expressing (left) or not expressing (right) β-lactamase were evaluated both by
flow cytometry (top), which measured quantitative blue/green emission ratios (49), and by fluorescence microscopy (bottom). (D) Fragments
derived from various L1 positions and subclasses were numbered and directionally oriented as indicated (Supplementary Figure S1 and
Supplementary Table S1). Their promoter strengths were assayed as described in part B. Key: colors and thicknesses indicate promoter activity

To quantify β-lactamase protein expression, cells were
stained with CCF2/AM substrate (Invitrogen) (48,49) by
replacing culture medium with 1 ml loading solution [2 µl
of a 1-mM CCF2/AM solution, 16 µl of solution B, 10 µl
of 250 mM Probenecid (Sigma) and 972 µl Hanks'
Balanced Salt Solution (HBSS)] per 9.6 cm² well. Cells
were incubated in the dark at room temperature (RT)
for 1 h with gentle shaking, washed with HBSS and
visualized using an Axiovert 200 M inverted fluorescence
microscope (Zeiss) with blue/aqua and β-lactamase ratio
filter sets (Chroma Technology Corp.) and ORCA-ER
high resolution digital camera (Hamamatsu Photonics)
using Openlab software (version 4.0.2, Improvision).
Flow cytometric analysis was performed using a BD
LSR II flow cytometer with a 405 nm violet laser, 440/
40 nm (blue) and 530/30 nm (green) filters, and
FACSDiva software (BD Biosciences). Ratios of blue to
green intensities were collected as a linear parameter. Each
flow cytometry session included positive and negative
controls to normalize output.
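
As a quick arithmetic check on the CCF2/AM loading solution described above, the following minimal Python sketch sums the listed component volumes and derives the final concentrations of CCF2/AM and probenecid. The helper name `final_concentration` is introduced here for illustration only; the stock concentrations and volumes are those stated in the text.

```python
# Volumes (in microliters) of each component of the loading solution,
# exactly as listed in the Methods text.
components_ul = {
    "CCF2/AM (1 mM stock)": 2,
    "solution B": 16,
    "probenecid (250 mM stock)": 10,
    "HBSS": 972,
}

total_ul = sum(components_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Simple dilution: C_final = C_stock * V_added / V_total.

    The result is in the same unit as `stock`.
    """
    return stock * volume_ul / total_volume_ul


# 1 mM CCF2/AM stock expressed in micromolar.
ccf2_final_um = final_concentration(1000.0, 2, total_ul)
# 250 mM probenecid stock, result in millimolar.
probenecid_final_mm = final_concentration(250.0, 10, total_ul)

print(f"total volume: {total_ul} ul")
print(f"CCF2/AM: {ccf2_final_um} uM, probenecid: {probenecid_final_mm} mM")
```

The volumes add up to the stated 1 ml per well, which is the only point the sketch is meant to confirm; the derived final concentrations follow from the listed stocks and are not stated in the text.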

TEM1 expression also was quantified by quantitative
reverse transcriptase-mediated PCR (qRT-PCR).
Promoter candidates were linearized by BglII digestion
and transfected into HeLa cells using FuGENE 6. Total
RNAs were isolated ~48 h after transfection using
RNeasy kit (Qiagen). Standard curves were based on
serial dilutions of control plasmids. First strand cDNAs
were synthesized using oligo-d(T) (DES2633) primer and
the Superscript double-stranded cDNA synthesis kit
(Invitrogen). As further controls, RNAs were treated
with and without reverse transcriptase. qRT-PCR was
performed on an iCycler (Bio-Rad) or StepOnePlus
(Applied Biosystems) instrument, using SYBR Green
Supermix (Bio-Rad). TEM1 transcript concentrations
were calculated by interpolation, after subtracting for
input plasmid DNA contamination. Beta-actin transcript
levels were calculated for each sample. Each sample was
measured in triplicate. Results are presented for each
sample as the normalized ratio of TEM1 to beta-actin
transcript levels.
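
The quantification steps above (standard curve, interpolation, subtraction of plasmid DNA signal, normalization to beta-actin) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis script: all Ct values are hypothetical, the no-reverse-transcriptase reaction is assumed to estimate the contaminating plasmid DNA signal, and a single standard curve is reused for both amplicons purely to keep the example short.

```python
def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, curve):
    """Interpolate a template quantity from a Ct value on a standard curve."""
    slope, intercept = curve
    return 10 ** ((ct - intercept) / slope)


def mean(values):
    return sum(values) / len(values)


def normalized_tem1(tem1_plus_rt_cts, tem1_minus_rt_cts, actin_cts,
                    tem1_curve, actin_curve):
    """TEM1/beta-actin ratio from triplicate Ct values.

    Assumption: the no-RT reaction measures carry-over plasmid DNA,
    which is subtracted from the +RT signal before normalization.
    """
    tem1_total = mean([quantity_from_ct(c, tem1_curve) for c in tem1_plus_rt_cts])
    tem1_dna = mean([quantity_from_ct(c, tem1_curve) for c in tem1_minus_rt_cts])
    actin = mean([quantity_from_ct(c, actin_curve) for c in actin_cts])
    return max(tem1_total - tem1_dna, 0.0) / actin


# Hypothetical serial dilution of a control plasmid (log10 copies vs Ct).
standards_log10 = [6.0, 5.0, 4.0, 3.0]
standards_ct = [15.08, 18.40, 21.72, 25.04]
curve = fit_standard_curve(standards_log10, standards_ct)

# Hypothetical triplicate measurements for one transfected sample.
ratio = normalized_tem1([21.72] * 3, [25.04] * 3, [18.40] * 3, curve, curve)
print(round(ratio, 3))
```

In practice each amplicon would have its own standard curve, and the replicate Ct values would differ; the sketch only shows where the subtraction and the normalization enter the calculation.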

Chromatin immunoprecipitation of RNA polymerases 

Anti-mouse RNA polymerase III subunit RPC39 mouse 
monoclonal antibody was purchased from Santa Cruz 
(catalog no. SC-21753). Anti-mouse RNA polymerase II 
mouse monoclonal antibody (Cat. 39097) was from Active 
Motif. For chromatin immunoprecipitation (ChIP), the 
Magna ChIP G Tissue kit (Millipore) was used following 
the manufacturer's instructions. 

Identification of TSS of TE-initiated fusion transcripts 

We performed 5' RACE cloning using a second-
generation 5'/3' RACE kit (Roche) with an AS L1
ORF1-specific primer (DES1947; cf. Supplementary
Table S1) for first strand cDNA synthesis.

Phage library screens for mouse transcripts containing 
L1 sequences

Double-stranded DNA probes for mouse L1 ORF2 and
ORF1 transcripts were amplified by PCR from L1spa
(AF016099), a representative full-length Tp template
(50). Bacteriophage cDNA libraries from mouse testis
(Clontech) and thymus (Stratagene) were hybridized
with an ORF2 probe followed by an ORF1 probe.

The primer pairs used to amplify fragments from L1spa
ORF2 and ORF1 were DES1165 x DES1166 and
DES1167 x DES1168, respectively (Supplementary Table S1).
Resulting PCR products were gel purified and
radiolabeled by random nonamer priming. Commercial
bacteriophage cDNA libraries from mouse testis
(Clontech) and thymus (Stratagene) were plated at
~50 000 plaques per dish, transferred to Hybond-N
filters (Amersham) and hybridized with the ORF2
probe. Filters were washed at 65°C in 0.1x SSC
and 0.1% SDS and autoradiographed. This procedure
was repeated with the ORF1 probe to identify
ORF1+ORF2- clones, which were purified upon
additional rounds of hybridization. Phage plaques were
converted to plasmids and sequenced using BigDye v. 3.1
(Applied Biosystems) on a 96 capillary sequencer
(Transgenomic Spectrumedix) with primers DES886 and
DES837 (5' and 3' ends, testis cDNA) and standard M13R
and M13F oligonucleotides (5' and 3' ends, thymus).
Additional sequences for full-length cDNA sequencing
by primer walking are available on request.

RT-PCR amplification of AS L1 RIFTs

To target AS L1 RIFTs, first strand cDNAs were
synthesized (Roche) from DNase-treated total RNAs,
using the M13F-anchored primer DES1141 paired with
DES1256 for mouse L1spa ORF1, AS nucleotides 2011-
1991. PCR products (1-3 kb) were isolated using gel
purification columns (Qiagen), and cloned for sequencing.
Tissue-specific expression patterns of AS L1 RIFTs and
corresponding cognate genes were analyzed by RT-PCR
using a commercial multiple-tissue mouse cDNA panel
(Clontech).

Computational identification of AS L1 RIFTs

A BLASTN search of mouse EST databases from testis
and other tissues was conducted using AS L1spa (Tp
subfamily) ORF1 as query, i.e. AS nucleotides 2225-1838
(cf. coordinates, Figure 1).
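
Antisense coordinates in this article are written high-to-low on the sense-strand numbering (e.g. AS nucleotides 2225-1838). A minimal sketch of how such a query could be derived from a sense-strand sequence is shown below. The function name `antisense_fragment` and the toy sequences are illustrative assumptions; the actual L1spa sequence (AF016099) is not reproduced here.

```python
# Translation table for complementing DNA bases (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_fragment(sense_seq, start, end):
    """Return the antisense (reverse-complement) sequence of a region.

    `start` and `end` are 1-based, inclusive positions on the sense strand
    and may be given in either order (AS coordinates are usually written
    high-to-low, e.g. 2225-1838).
    """
    hi, lo = max(start, end), min(start, end)
    if lo < 1 or hi > len(sense_seq):
        raise ValueError("coordinates fall outside the sequence")
    sense_region = sense_seq[lo - 1:hi]
    return sense_region.translate(_COMPLEMENT)[::-1]


# Toy example: positions 5-2 of a short sense sequence.
print(antisense_fragment("ATGCCGTA", 5, 2))

# Length of the AS ORF1 query region described in the text,
# using a placeholder sequence of the right order of magnitude.
placeholder = "A" * 3000
print(len(antisense_fragment(placeholder, 2225, 1838)))
```

Because the coordinates are inclusive, the 2225-1838 query spans 388 nucleotides.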

Identification of RIFTs using exon microarrays 

To develop a novel assay to identify L1 RIFTs, we
modified the manufacturer's protocol for the Affymetrix
GeneChip mouse exon microarray. First strand cDNA
synthesis was performed on total RNA isolated from
various tissues and lineages, using a primer including both T7
promoter and oligo-d(T) sequences and Superscript
II reverse transcriptase (Invitrogen). Polyadenylated
cDNAs containing AS L1 sequences were amplified using
a primer for a particular L1 ORF1 AS template sequence
paired with the T7 promoter primer. Resulting
double-stranded RIFT cDNAs containing T7 sequences were
used as templates for in vitro transcription, following
standard procedures (Affymetrix). Resulting AS RNA
was purified; a second round of first strand cDNA
synthesis was performed with reverse transcriptase, dUTP and
random primers; cRNA was hydrolyzed using RNase H,
and resulting sense strand DNA was purified. Products
were fragmented with uracil DNA glycosylase and
apurinic/apyrimidinic (AP)-endonuclease 1. Terminal
labeling was performed with terminal deoxynucleotidyl
transferase (TdT), and resulting labeled fragments were
hybridized to the exon microarray.

Figure 1. Continued

scores for each fragment assayed. The highest scores (>50, red, thick line) indicate strongest promoter activities. (E) TEM1 transcript levels were
measured using qRT-PCR (arrows: primer binding sites) to assess the candidate fragments' promoter activities. (F) The ratio of TEM1 to beta-actin
transcript concentrations was calculated (y-axis) after correction for amplification of contaminating plasmid or genomic DNA. As a positive control,
SV40 early promoter was engineered upstream of the TEM1 reporter, and as negative controls, no promoter was included or no plasmid was
transfected. The AS L1 promoter activity (fragment 6) is half that of the sense-stranded mouse L1 5' UTR promoter (fragment 1). Fragments are
numbered as in (D).

Resulting raw data from an Affymetrix microarray
chip reader were analyzed for transcript expression,
using Partek Genomics Suite software. CEL files
(MoEx-1_0-st-v1) were imported using RMA background
correction and quantile normalization. Probe intensities
were transformed to log base 2. We defined signals with
intensity > mean + one standard deviation (i.e. log2
intensity >7) as high expression probes and counted the
number of consecutive high expression probes per
annotated gene. On alignment with the reference mouse
genome, candidate fusion transcripts were scored positive
if a neighboring AS L1 could be identified within 30 kb of
an overlapping RefSeq gene and/or within 100 kb of the
upregulated probe(s). We also required five consecutive
high expression probe intensities (corresponding to
adjacent exons in a given gene) in exon microarray data;
length of predicted initiating genomic L1 integrant had to
be >5 kb; and its subtype had to be L1 Tp, A, Gp or F as
per RepeatMasker (www.repeatmasker.org).
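
The probe-level criteria above (threshold at mean plus one standard deviation of log2 intensities, then at least five consecutive high-expression probes per gene) can be sketched as follows. This is an illustrative sketch only: the analysis in the text was done in Partek Genomics Suite, the function names are hypothetical, all intensities are made up, and the use of the population standard deviation is an assumption because the text does not specify which estimator was applied.

```python
import statistics


def threshold_from_chip(all_log2_intensities):
    """Mean + one standard deviation of log2 probe intensities.

    Assumption: population SD (pstdev); the text does not say which
    estimator was used.
    """
    return (statistics.mean(all_log2_intensities)
            + statistics.pstdev(all_log2_intensities))


def longest_high_run(log2_intensities, threshold):
    """Length of the longest run of consecutive probes above threshold."""
    best = current = 0
    for value in log2_intensities:
        current = current + 1 if value > threshold else 0
        best = max(best, current)
    return best


def passes_probe_criterion(gene_probes_log2, threshold, min_consecutive=5):
    """True if the gene has >= min_consecutive adjacent high probes."""
    return longest_high_run(gene_probes_log2, threshold) >= min_consecutive


# Hypothetical probe intensities (log2) ordered along two genes.
threshold = 7.0  # the text reports the cut-off as log2 intensity > 7
gene_a = [6.5, 7.2, 7.5, 8.0, 7.1, 9.3, 6.0]
gene_b = [7.2, 7.5, 6.9, 8.0, 7.1, 9.3]

print(passes_probe_criterion(gene_a, threshold))
print(passes_probe_criterion(gene_b, threshold))
```

The remaining criteria in the text (distance to a neighboring AS L1, integrant length and RepeatMasker subtype) depend on genome annotation and are not modeled in this sketch.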

Modified L1-reporter plasmids

To compare retrotransposition frequencies of various L1
donors, we started by replacing native L1 ORF1
sequences with a synonymous fragment of smL1 (47), by
moving a NotI-XhoI fragment from pTN201 (L1spa)
(50), into pBluescript-KS(+), yielding pMK20. A PstI-
HindIII fragment of pMK20 was moved into pBS-
KS(+), yielding pMK21. Using a QuikChange
site-directed mutagenesis kit (Stratagene), we introduced
a PacI restriction site into the inter-ORF region in
pMK21, yielding pMK22. Its 0.2-kb NotI-PacI fragment
was replaced with the 2.9-kb NotI-PacI fragment from
pCEP/smL1 (i.e. the synthetic 5' UTR and ORF1) (47),
resulting in the 8.5-kb plasmid pMK22smORF1. We
then subcloned its 5.6-kb NotI-HindIII fragment back
into the NotI-HindIII backbone (5.4 kb) from pMK20,
resulting in ~11-kb pMK27, i.e. a marked, full-length
L1 donor element. This plasmid was Sanger sequenced
(Big Dye 3.1, Applied Biosystems; Transgenomic
Spectrumedix), revealing a missense mutation in ORF2,
i.e. Ala756Ser, along with two noncoding mutations
present in pTN201. The missense mutation was repaired
by replacement of a ~1.4-kb EcoRI fragment in pMK27
with the corresponding fragment from pTN201. The
8.1-kb NotI-XhoI fragment of repaired pMK27 was
ligated with a ~11.7-kb NotI-XhoI pTN201 backbone
fragment, yielding the desired final plasmid, i.e. pMK28
or L1spa::smL1-ORF1.

To preserve A/T content and synonymous amino acids
of native L1s, while maximally changing codon usage,
we also designed a novel recoded L1 ORF1 fragment,
corresponding to 2123-2932 nt from L1spa (Supplementary
Figure S2). This fragment (Blue Heron Bio), which
also included 50-nt flanking arms on both ends
for recombineering, was cloned into pUC MinusMCS,
resulting in plasmid pJL2. The recoded L1 ORF1 fragment
from pJL2 was amplified by PCR using
DES3353 x DES3354 and Platinum Taq DNA polymerase
High Fidelity (Invitrogen), gel purified (Qiagen),
mixed with PstI-linearized pMK20 and co-transformed
into electrocompetent DY380 bacteria, bearing the
lambda red recombination system for recombineering
(51). After heat shock at 42°C for 15 min, to induce the
lambda system, bacteria were cultured on LB+Carb agar
plates at 32°C overnight. Candidate clones containing
recombinant pMK20: pJL2 were screened by PCR and PstI
digestion, and verified by sequencing. Candidate and
control plasmids were digested with NotI and XhoI
at 37°C overnight. An 8.1-kb fragment containing the
synthetic ORF1 was ligated to an 11.7-kb NotI-XhoI
fragment from the pTN201 backbone. The final construct,
pJL3 or L1spa::recoded-L1-ORF1, was verified by Sanger
sequencing.

Various AS L1 transcript overexpression plasmids were
engineered from L1 Tp template fragments generated
by PCR using HiFi platinum Taq (Invitrogen), using
primer pairs DES2880 x DES2881 (L1spa nucleotides
2150-1286); DES2880 x DES2882 (nucleotides 2150-1636)
and DES2879 x DES2881 (nucleotides 2823-1286).
Resulting PCR products were digested with NotI and
BamHI, electrophoresed on agarose gels, purified and
ligated to linearized pCEP4 backbone. Fragments from
AS synthetic L1 elements similarly were generated using
primers DES3818 x DES3820 (amplicon mapped to
corresponding coordinates in L1spa, nt 2150-1121), DES3819 x
DES3820 (nt 2150-1801), DES3818 x DES3821 (nt 2823-
1121) and DES3819 x DES3821 (nt 2823-1801). Products
were digested with NheI and BamHI and ligated to
similarly linearized pCEP4.

L1 retrotransposition assays

HeLa cells were cultured in DMEM with 10%
heat-inactivated fetal bovine serum and 2%
penicillin/streptomycin (Gibco). Cells at ~75% confluence in six-well plates
or T25 flasks were transfected with plasmid DNA mixed
with FuGENE HD (Roche) at a ratio of 1 µg to 3 µl
FuGENE. To quantify transfection efficiency, GFP
expression was assessed by fluorescence microscopy in
cells transfected with pEGFP-N1 (Clontech). Both stable
and transient transfection assays were performed. In the
former (13,23), cells were treated with 0.2 mg/ml
Hygromycin for various periods, starting 3 days after
transfection. Hygromycin-resistant (HygroR) cells then
were grown without selection for several days, prior to
selection for NeoR L1 integrants in 0.4 mg/ml G418
(Invitrogen) for 2 weeks. In the transient assay (52), NeoR
L1 integrants were selected directly. After discrete colonies
formed in either assay, cells were washed in 1x phosphate
buffered saline (PBS), fixed in 2% formaldehyde/0.2%
glutaraldehyde in 1x PBS, washed and stained using
0.4% Giemsa (Sigma) at RT overnight and then counted.

To assess effects of AS L1 transcripts overexpressed in
trans, we co-transfected HeLa cells with smL1 donor
plasmid together with AS smL1 fragment-expressing
constructs. One µg of pCEP4/smL1/Neo donor plasmid DNA
was mixed with 1 µg of various smL1 AS fragment-
expressing constructs or empty vector pCEP4,
respectively, in FuGENE 6. Two µg of vector pCEP4 was used
as another negative control. A transient assay (52) was
performed to test impacts of the AS smL1 fragments on
retrotransposition, by plating cells at various dilutions and
counting resulting NeoR colonies. A similar experiment
was performed to assess inhibition of L1spa
retrotransposition (from donor plasmid pTN201) on expression
of AS L1 transcripts (see Supplementary Information).
In an independent assay of expression levels from various
L1-TEM1 reporter constructs, we conducted qRT-PCR.
Plasmids were transfected into HeLa cells, and total
RNAs were isolated and treated with DNase (Ambion).
First strand cDNAs were synthesized using an oligo-d(T)
primer. As a control, samples were treated with no reverse
transcriptase. L1 expression was measured by SYBR green
PCR master mix (Applied Biosystems) using ORF2 primers
DES2784 x DES2790 for pTN201/TEM1, pJL3/TEM1 and
pMK28/TEM1; and DES1847 x DES1848 for pCEP4/
smL1/TEM1. As a control for transfection, Hygromycin
gene expression was measured by qRT-PCR using
DES1249 x DES1250.



Role of Dicer in regulating L1 retrotransposition

To assay L1 retrotransposition in Dicer ex5 -/- HCT116
human colorectal cells, which are constitutively NeoR (53),
we used L1 donors marked by TEM1-artificial intron
(AI) (23). The TEM1-artificial intron (AI) reporter
cassette and a portion of pCEP4 backbone were excised
from pDES46 (which contains human L1.3) by digestion
with NotI and BstZ17I. The resulting ~13-kb backbone
fragment was gel-purified. Native, hybrid or fully
synthetic mouse L1 constructs in pTN201, pJL3, pMK28
and pCEP4/smL1 were digested by BamHI, ends were
filled in by Klenow and digested by NotI. Each of the
resulting ~6.5-kb L1 fragments was ligated with the
NotI-BstZ17I fragment. Positive candidates were
confirmed by conventional Sanger sequencing. Resulting L1
donor plasmids, marked by the TEM1-AI reporter,
included pTN201/TEM1, pJL3/TEM1, pMK28/TEM1
and pCEP4/smL1/TEM1.

To compare retrotransposition with native or hybrid
L1 donors marked with TEM1-AI reporter in HCT116
wild-type versus Dicer -/- cell lines, transfectants were
selected for 10 d on 400 µg/ml hygromycin. Expression
levels of spliced TEM1 transcripts were assayed by
qRT-PCR using DES3062 x DES3063.



RESULTS 

Mapping an active AS promoter in mouse L1 ORF1

Previously, we and others identified mouse AS L1 RIFTs
(4,45). Based on their approximate 5' ends and widespread
expression, we hypothesized that an active initiating AS
promoter could reside in an AS orientation within ORF1
of mouse L1. To characterize this putative promoter
experimentally, we engineered 36 candidate promoter
fragments directionally upstream of a TEM1 β-lactamase
reporter gene otherwise lacking a promoter (48). To
assay promoter activities of these fragments, we
transfected resulting constructs individually into cultured
mouse or human cells (Figure 1 and Supplementary
Figure S1). The candidates were derived from mouse L1
subfamilies Tp, Gp, A and FIII; fully synthetic
synonymously recoded smL1 (more recently called ORFeus) (47);
and a novel synonymously recoded ORF1 template
that we generated with A/T content similar to native
elements. As positive controls, a constitutively active
SV40 promoter and arrays of sense strand L1 5' UTR
monomers from Tp and Gp elements were engineered
upstream of TEM1. As a negative control, no fragment
was inserted upstream. Promoter strength scores were
assigned to each fragment, based on β-lactamase reporter
enzymatic activity expression, or TEM1 transcript levels
(Figure 1 and Supplementary Figure S1).

The highest level of AS promoter activity was found in
L1 Tp AS nucleotides 2823-2125, mapped as per L1spa
coordinates (Figure 1A). Various L1 subfamily members
displayed distinct AS promoter activities, i.e. Tp (~40% of
positive control, i.e. L1 Tp 5' UTR monomers in sense
orientation) >> Gp ~ A (~10% of control) > F (~5% of
control). For these functional promoter assays, we chose
particular elements to represent the subfamilies, i.e. L1spa
for L1 Tp subfamily; L1 Gp62 for the Gp subfamily and
L1Md_A2 for the A subfamily (Supplementary Figure S2).
Within ORF1, these individual surrogates were 99.8, 99.7
and 99.9% identical to the consensus subfamily sequences,
respectively (Supplementary Figure S2). Differences
between the subfamily consensus sequences and the
individual surrogates were predicted at 1944A>G and
2261G>C (i.e., L1_Tp>L1spa, coordinates of sense
strand, L1 Tp reference element nucleotide listed
first); 1963C>T, 2687T>A, 2716T>C and 2857A>C
(L1_Gp>L1 Gp62); and 2857G>A (L1_A>L1Md_A2).

A qRT-PCR assay for reporter transcript expression
confirmed that L1 Tp AS promoter activity was robust,
i.e. again, approximately half that of the L1 5' UTR sense
promoter (Figure 1). Low but detectable promoter
activities were observed in older L1 subfamilies including
F, FII and/or FIII (Supplementary Figure S1). By contrast,
virtually no promoter activity was detected in various
fragments derived from the sense (coding) orientation of
ORF1, AS ORF2, L1 3' UTR, smL1 or a novel recoded
ORF1 sequence which we designed to contain A/T content
comparable with natural L1 sequences (Figure 1 and
Supplementary Figure S1) (47).

We examined a potential basis for the broad range
of AS promoter activities among different mouse L1
subfamilies. Although they are defined mainly by
differences between 5' UTR sequences, their
sequences within ORF1 also are distinct (Supplementary
Figure S2). Comparison of representative L1 subfamily
amino acid sequences encoded by ORF1 indicated that
the particular portions comprising the AS promoter
were more conserved, but still distinct, between
subfamilies, compared with the flanking, proximal and
distal portions of ORF1 (Supplementary Figure S2). By
contrast, the L1 subfamily sequences within ORF2, which
do not contain this AS promoter, were nearly identical
(not shown). These results suggested that ORF2 and the
AS promoter segment within ORF1 may have undergone
strong purifying selection (Supplementary Figure S2).
A recent analysis of the evolution of mouse and human
L1s confirmed that the mouse ORF1 coiled-coil domain
has undergone much less adaptive change than that of
human elements (15).

Figure 2. RNA polymerase II transcribes AS L1 fusion transcripts. (A) Chromatin immunoprecipitation (ChIP) with anti-RNA polymerase II (left)
and anti-RNA polymerase III (right) antibodies, followed by PCR amplification of target L1 or SINE B2 genomic sequences as indicated (right),
showed specific enrichment (pulldown) of pol II at the AS L1 promoter in mouse testis (asterisks, L1 ORF1 sequences). Coordinates from L1spa
reference are shown (right, cf. Figure 1A). RNA pol II also immunoprecipitated proximal L1 sequences, i.e. templates for transcribed AS fusion
transcripts. As a control, both pol II and pol III pulled down SINE B2 elements genome-wide (bottom) as expected (54). (B) Mouse spermatocytes
were treated with α-amanitin (RNA pol II inhibitor) as indicated (top). Total RNAs were isolated, and reverse transcriptase was added as indicated
(+ or -; top) before PCR amplification of various cDNAs as indicated (right). As a negative control, U6 transcripts (RNA pol III, not inhibited by
α-amanitin) were amplified (bottom).

RNA polymerase II transcribes AS L1 fusion transcripts

To confirm localization of AS promoter activity to mouse
L1 ORF1 sequences and to define the RNA polymerase
responsible for transcriptional initiation from it, we
immunoprecipitated both RNA polymerases (pol) II and
III, either of which plausibly could bind to and initiate
fusion transcription from various endogenous TE
sequences. As shown in Figure 2A, RNA pol II localized
specifically to the ORF1 fragment that contains AS
promoter activity, i.e. nucleotides 2125-2823 (Figure 1).
Notably, ChIP-PCR also demonstrated that RNA pol II
bound to ORF1 nucleotides 1528-2061, mapping to L1
template sequences, downstream of the AS promoter,
that are expressed as AS L1 fusion transcripts. As a
control, ChIP-PCR analysis of SINE B2 sequences
confirmed that both RNA pol II and RNA pol III bound to
those sequences (54).

To confirm the role for RNA pol II in transcribing AS
L1 RIFTs, we treated a mouse spermatocyte cell line,
CRL2196, with α-amanitin (Figure 2B), a potent and
specific RNA pol II inhibitor. We assayed AS L1 RIFT
expression by qRT-PCR, demonstrating substantial
inhibition by this drug both in general and at individual loci.
Together with ChIP-PCR, our results indicated that AS
L1 RIFTs were transcribed by RNA pol II.

Identification of diverse AS L1 RIFTs

To find mouse transcripts that included sequences from
genomic L1 templates, we screened full-length transcripts
represented in bacteriophage libraries. Although 6-30%
of all mouse and human transcripts recently were
reported to be initiated from TEs including L1s (17), we
observed that only ~0.1 and 0.03% of all transcripts in
phage cDNA libraries representing testis and thymus,
respectively, hybridized with an L1 Tp subfamily probe for
ORF2 (hereafter called ORF2+ transcripts, Figure 1 and
Supplementary Table S1B). Sequential hybridization with
an L1 Tp ORF1 probe (Figure 1) revealed an additional
0.06% of testis transcripts and 0.02% of thymus
transcripts, identifying those that contained 5' L1 ORF1 but
not ORF2 sequences. Of 940 testis cDNA clones
hybridizing with either probe, 363 (~39%) were ORF1+.
Similarly, of 253 thymus cDNA clones hybridizing with
either probe, 99 (~39%) were ORF1+.

We hypothesized that such ORF1+ORF2- transcripts 
would include AS L1 RIFTs. This possibility was 
prompted by our previous identification of fusion transcripts 
in adult mouse tissues, mapping to L1 elements 
(4). Of 27 ORF1+ORF2- transcripts identified from 
testis, 21 (78%, Supplementary Table S2) contained AS 
L1 ORF1 sequences spliced with other exons in the sense 
orientation, forming AS L1 RIFTs. Additionally, 2 of 13 
thymus cDNAs (15%, Supplementary Table S2) also were 
spliced AS L1 RIFTs. Other ORF1+ORF2- cDNAs either 
were unspliced AS RIFTs, reading antiparallel to ORF1 
through the 5' UTR into flanking genomic sequences (4 in 
testis, 15% of total; 2 in thymus, 15%), or were prematurely 
polyadenylated, sense-strand transcripts (2 in testis, 
7%; 9 in thymus, 69%) (55). Some RIFTs were initiated in 
other mouse strains by polymorphic L1s absent from the 
C57BL/6J (B6) reference genome (4,5). These screens also 
showed that some AS RIFTs were readily detectable 
without PCR amplification. 
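
The category percentages quoted above can be re-derived from the clone counts. The following is a minimal sketch of that arithmetic; the counts come from the text, while the dictionary keys and the function name are illustrative labels only and are not part of the original analysis.

```python
# Counts of ORF1+ORF2- cDNA clones per category, as reported in the text.
# Category names are illustrative labels, not the authors' terminology.
counts = {
    "testis": {"spliced_AS_RIFT": 21, "unspliced_AS_RIFT": 4, "sense_polyadenylated": 2},
    "thymus": {"spliced_AS_RIFT": 2, "unspliced_AS_RIFT": 2, "sense_polyadenylated": 9},
}


def category_percentages(tissue_counts):
    """Return each category's share of the tissue total, rounded to a whole percent."""
    total = sum(tissue_counts.values())
    return {name: round(100 * n / total) for name, n in tissue_counts.items()}


for tissue, tissue_counts in counts.items():
    print(tissue, sum(tissue_counts.values()), category_percentages(tissue_counts))
```

The category counts sum to the 27 testis and 13 thymus transcripts reported, and the rounded shares match the percentages given in the text.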

We identified diverse spliced ORF1+ORF2- transcripts 
initiated across the genome in a variety of chromosomal 
and tissue contexts, as illustrated by schematics of their 
genomic templates including the initiating L1 elements 
(Supplementary Figure S3) (56). To determine whether 
AS L1 RIFTs were expressed more broadly, we screened 
additional mouse strains and cell lines by qRT-PCR. We 
experimentally identified 41 additional AS L1 RIFTs 
(Supplementary Table S2) expressed in cultured mouse 
spermatocyte cells or adult testes. Twelve (29%) aligned 
to genomic regions lacking a previously annotated gene, 
and two (5%) were initiated from polymorphic L1s 
absent from B6 mice (5). In addition, we searched public 
expressed sequence tag (EST) libraries by BLAST alignments 
(45), revealing 15 additional full-length mouse testis 
ESTs (57) as spliced AS L1 RIFTs (Supplementary Table 
S2). Fifty-seven EST clones contained AS L1 sequences in 
their 5' ends. Of these, 22 were spliced, but no splicing was 
observed within L1 sequences per se. Many of the AS L1 
RIFTs identified by bioinformatics analysis were found in 
testis and embryonic cells at certain developmental stages, 
again suggesting a high level of tissue specificity. This 
search identified >80 EST clones with AS alignment 
>300 nt and >90% identity with L1 at their 5' ends, of 
which 15 were full-length RIKEN cDNAs. In some cases, 
3' paired ends of other EST clones were identified from the 
EMBL/EBI database using 5' clone IDs; 57 clones were 
sequenced from both ends. 
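
The EST selection criteria above (antisense alignment >300 nt and >90% identity with L1 at the 5' end) can be written as a simple filter. This is only a sketch under stated assumptions: the `Hit` record, its field names, the example clone names and the `max_query_start` cutoff used to approximate "at the 5' end" are all hypothetical, since the text does not specify how alignments were parsed or how 5' proximity was defined.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One EST-versus-L1 alignment (hypothetical record layout)."""
    clone_id: str        # placeholder EST identifier
    strand: str          # "-" means the EST aligns antisense to L1
    align_len: int       # aligned length in nucleotides
    pct_identity: float  # percent identity to L1
    query_start: int     # 1-based start of the alignment within the EST


def is_candidate(hit, min_len=300, min_identity=90.0, max_query_start=50):
    """Apply the thresholds from the text; max_query_start is an assumed cutoff."""
    return (
        hit.strand == "-"
        and hit.align_len > min_len
        and hit.pct_identity > min_identity
        and hit.query_start <= max_query_start
    )


# Made-up example hits, for illustration only.
hits = [
    Hit("est_A", "-", 412, 96.5, 1),   # passes all criteria
    Hit("est_B", "+", 500, 98.0, 1),   # sense orientation, rejected
    Hit("est_C", "-", 250, 97.0, 3),   # alignment too short, rejected
    Hit("est_D", "-", 350, 88.0, 2),   # identity too low, rejected
]
candidates = [h.clone_id for h in hits if is_candidate(h)]
print(candidates)
```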

To compare RIFT expression levels in different tissues, 
we re-assayed 17 RIFTs identified initially in adult 
testis or from a spermatocyte cell line (Supplementary 
Figure S4). As expected, almost all of these RIFTs were 
confirmed in testis. Relatively few were expressed in other 
tissues assayed, but we did recover clones lASIIl, additionally 
expressed in 11-day embryos; L1-5AS1-1, additionally 
expressed in brain; and CRL2196C10, widely 
expressed in most tissues assayed. We also assayed for 
overlapping spliced transcripts from cognate genes. 
Although AS L1 RIFTs that were spliced to downstream 
exons of Erbb2ip, Usp29 and Arhgap15 each were expressed 
in testis, the corresponding conventional transcripts 
of these genes (i.e. lacking sequences from L1s) 
were not detectable there. 

To identify genes whose expression levels may be 
affected by AS L1 RIFTs, we probed Affymetrix mouse 
exon microarrays conventionally with total RNAs. As 
commercial exon microarrays typically exclude probes 
for repetitive elements such as L1 retrotransposons, we 
also developed a novel, unconventional assay using the 
arrays to screen specifically for AS L1 RIFTs that 
include downstream exons. In this assay, hereafter called 
the RIFT assay, we prepared cDNAs from several tissues 
and mouse lineages by RT-PCR, using an AS L1-specific 
primer paired with an oligo-d(T) primer. At least 130 
unique spliced AS L1 RIFTs were identified in adult 
testis (Supplementary Table S2), of which many were 
also identified in phage cDNA libraries (Supplementary 
Table S2). Thus, many transcripts were corroborated by 
independent methods. 
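
The Figure 3 legend states that five consecutive exon probes had to be strongly positive to call a RIFT. A minimal sketch of such a run-based call is shown here; the numeric threshold for "strongly positive" is not given in the text, so `threshold` is an assumed parameter, and the function name and example intensities are hypothetical.

```python
def has_positive_run(intensities, threshold, run_length=5):
    """Return True if at least `run_length` consecutive probes meet the threshold.

    `intensities` are probe signals ordered by genomic position; `threshold`
    is an assumed cutoff, since the text does not define "strongly positive".
    """
    run = 0
    for value in intensities:
        # Extend the current run, or reset it when a probe falls below threshold.
        run = run + 1 if value >= threshold else 0
        if run >= run_length:
            return True
    return False


# Made-up log-scale probe signals for two loci.
locus_with_rift = [2.1, 9.4, 9.8, 10.2, 9.1, 9.9, 3.0]
locus_without_rift = [9.4, 9.8, 10.2, 9.1, 2.5, 9.9, 9.7, 9.6, 9.5]
print(has_positive_run(locus_with_rift, threshold=8.0))
print(has_positive_run(locus_without_rift, threshold=8.0))
```

Requiring a run of adjacent probes, rather than a total count of positive probes, guards against calling a RIFT from isolated cross-hybridizing probes.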

Both assays, i.e. the RIFT assay and conventional expression 
profiling using exon microarrays, confirmed the 
expression of an AS L1 RIFT at Arhgap15, initially found 
by screening a testis cDNA library (4). The initiating L1 
integrant is polymorphic (5) and is oriented antiparallel to 
the transcription unit of Arhgap15. The AS L1 RIFT, expressed 
in the same orientation as the overlapping gene's 
reading frame (Figure 3), was readily detectable in RNA 
from B6 but not the other strains tested, consistent with 
the presence or absence of the initiating L1 element. Both 
assays showed that this AS RIFT contributed to overall 
Arhgap15 RNA levels, in particular those measured at its 
3' end. 

The RIFT assay also showed that distinct AS L1 
RIFTs, although expressed in various tissues, were most 
abundantly expressed in testis. Several other RIFTs were 
identified in brain and kidney (Figure 4). Notably, a few 
RIFTs were expressed in more than one tissue. Thus most, 
but not all, RIFTs were expressed in a tissue-specific 
fashion. In addition, comparison of RIFTs expressed in 
five diverse strains highlighted that approximately half 
were conserved in all five strains (Figure 4), implying 
that potential biological functions of some RIFTs may 
be shared. Other RIFTs were expressed only in particular 
strains, consistent with the presence of the polymorphic 
L1 AS promoter in about half of these cases and with 
differential RIFT expression in the others. 

Using targeted RT-PCR, we observed that ~40% of extant 
L1 TF subfamily members studied here initiated a nearby 
AS L1 RIFT. About 13% of L1 GF elements, about 4% of 
A elements, and zero of one F element initiated RIFTs (4). 
Overall, about 19% of 68 genomic elements initiated 
RIFTs (data not shown). 

AS L1 RIFT TSS are proximal to the AS L1 promoter 

To identify the 5' TSS of AS L1 RIFTs, we performed 5' 
rapid amplification of cDNA ends (5' RACE) analysis on 

Figure 3. Contribution of an AS L1 RIFT to overall Arhgap15 gene expression in various mouse strains. (A) Schematic representation of Arhgap15 
exons, including a polymorphic AS L1 integrant in the B6 reference genome but not in other strains. (B) AS L1 RIFT expression at Arhgap15 was 
detected in B6 mice, using the novel RIFT assay where we performed RT-PCR using AS L1 and oligo-d(T) primers, followed by hybridization of 
resulting cDNA products to an Affymetrix mouse exon microarray. We required five consecutive exon probes to be strongly positive to call RIFTs. 
Shown are genomic positions of probes within exons (x-axis) and hybridization signal intensities on a log scale (y-axis). Legend, inset: five mouse 
strains, different symbol colors and shapes. (C) Conventional assay for Arhgap15 expression in total RNAs (see legend, B). The AS L1 RIFT in B6 
mice affects total RNA expression levels at the 3' exons downstream of the polymorphic, initiating L1 integrant (see corresponding positions, A). 





Figure 4. Comparison of AS L1 RIFTs expressed in various mouse tissues and strains. Distinct AS L1 RIFTs were counted in Venn diagrams 
depicting shared (overlapping) and unique (distinct) RIFTs expressed in different mouse strains and tissues. Numbers indicate unique RIFTs in each 
group. (A) AS L1 RIFTs expressed in B6 testis (n = 71, blue), brain (n = 9, red) and kidney (n = 8, green); (B) AS L1 RIFTs expressed in testis of 
five mouse strains: 129S1 (n = 70, blue), 129X1 (n = 66, red), A/J (n = 63, green), B6 (n = 71, purple) and DBA/2J (n = 62, orange). 

Figure 5. AS transcription start sites found by 5' RACE in multiple tissues. (A) A 5' RACE was performed by PCR for 5' ends of AS L1 RIFTs, 
using total RNAs from testis, kidney and brain. Products were separated by agarose gel electrophoresis. Individual cloned 5' ends were sequenced 
from these pools. (B) The cumulative positions of TSS for AS L1-gene RIFTs are plotted by summing individual transcripts' 5' ends, mapped against 
coordinates from L1spa. We analyzed 19 5' RACE clones from testis (red), 35 from kidney (blue) and 54 from brain (green). Also superimposed here 
are the cumulative positions of 5' ends from 24 RIKEN clones that align well with L1spa, although these formally are not ends determined by 5' 
RACE cloning (Supplementary Table S2) (45). 



fusion transcripts expressed in testis, kidney and brain. 
A primer specific for the L1 TF ORF1 template was 
paired with a standard RACE primer for amplification 
from total RNAs. A range of PCR product sizes 
was observed, revealing multiple nearby TSS (Figure 5). 
A large fraction of the 5' ends of transcripts recovered 
from all three tissues mapped to ORF1 nucleotides 
2201-2244. In kidney and brain, additional TSS mapped 
to a wider range of ORF1 sequences, i.e. nucleotides 2210-2306 
and nucleotides 2210-2478, respectively. These 
results correlated well with the 5' ends of RIFT cDNAs 
identified in phage libraries (Supplementary Table S2). In 
addition, the 5' ends of 24 RIKEN cDNA clones, most 
of which were reported previously (45), mapped to this 
same region. Thus, the 5' TSS of the fusion transcripts, 
determined experimentally by 5' RACE analysis and from 
cDNA clones, were closely adjacent to the experimentally 
mapped AS L1 promoter (Figure 1). 
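
The cumulative TSS plot described for Figure 5 amounts to tallying the mapped 5' end of each RACE clone by its L1 coordinate. The sketch below illustrates that tally and the fraction of ends falling in a coordinate window such as 2201-2244. The function names are illustrative, and the example coordinates are invented for demonstration; they are not the clones reported in the study.

```python
from collections import Counter


def tss_profile(five_prime_ends):
    """Count how many cloned 5' ends map to each L1 coordinate."""
    return Counter(five_prime_ends)


def fraction_in_window(five_prime_ends, start, end):
    """Fraction of 5' ends whose coordinate lies in [start, end], inclusive."""
    if not five_prime_ends:
        return 0.0
    inside = sum(1 for pos in five_prime_ends if start <= pos <= end)
    return inside / len(five_prime_ends)


# Invented 5' end coordinates for one hypothetical tissue.
example_ends = [2201, 2210, 2210, 2244, 2306, 2478]
profile = tss_profile(example_ends)
print(profile.most_common(1))
print(fraction_in_window(example_ends, 2201, 2244))
```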

We observed a candidate transcript-initiating TATAA 
sequence at position 2698 of AS L1 TF ORF1 
(Supplementary Figure S2), but it is likely too distant 
from the RIFTs' 5' ends, identified by RACE (Figure 5), 
to account for them. Nevertheless, many mouse and 
human promoters lacking TATAA sequences have been 
identified previously, including variants of an 'initiator 
element (Inr)' sequence (58). We noted several variants 
of this sequence within the mapped AS promoter, some 
of which were immediately adjacent to observed TSS in 
the RIFTs. 

Other predicted features of AS L1 RIFTs 

To determine whether there are canonical splice donor 
sites and predicted translation start sites in the L1 templates 
for AS RIFTs, first we mapped an arbitrary collection 
of 65 spliced, fully sequenced AS L1 RIFTs to the 
reference genome. The cDNAs were spliced mostly at one 
of two consensus donor sites. The most common donor, 
used in 44 (68%) of 65 RIFTs, was GATGgtgag (coordinate 
1838 of L1spa, Figure 1). Another common splice 
donor in 13 (20%) transcripts was TCAGgtgtg (L1spa 
coordinate 1892). Both of these donor sites included conventional 
splicing sequences. Conceptual translation of 
fusion transcripts revealed that eight predicted translation 
start sites (ATG) occurred within the AS ORF1 sequences 
of the RIFTs, in at least two of three possible reading 
frames in a variety of sequence contexts, suggesting that 
fusion proteins may be expressed from many diverse 
transcripts. 
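
Locating candidate translation start sites by reading frame, as in the conceptual translation described above, can be illustrated with a short scan for ATG codons. This is a simplified sketch: the function name and the toy sequence are hypothetical, the sequence is not taken from L1, and no Kozak context or downstream ORF length is evaluated.

```python
def atg_positions_by_frame(seq):
    """Group 0-based positions of every ATG in `seq` by reading frame (0, 1 or 2)."""
    seq = seq.upper()
    frames = {0: [], 1: [], 2: []}
    start = seq.find("ATG")
    while start != -1:
        frames[start % 3].append(start)
        # Continue searching one base downstream so overlapping frames are not missed.
        start = seq.find("ATG", start + 1)
    return frames


toy_sequence = "ATGAATGCCATG"  # toy example, not an L1 sequence
print(atg_positions_by_frame(toy_sequence))
```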

Effect of genomic context on AS L1 RIFT expression 

To assess whether variable position effects or gene-specific 
expression differences (16) could influence AS promoter 
activity differentially at distinct chromosomal loci, 
we asked whether comparable intronic AS promoters, 
located in various genomic locations but present within 
the same tissues, could have similar activities. We 
identified 13 (19%) of 68 polymorphic full-length L1s, 
which initiated AS L1 RIFTs in testis, as assayed 
by RT-PCR (4). Thus although a significant fraction of 
distinct L1 AS promoters initiated AS L1 RIFTs, a 
majority did not, even in 'transcriptionally capable' 
tissues such as testis. This observation suggests that the 
genomic contexts (16) of comparable extant L1 integrants 
can influence their expression of AS RIFTs. 

Impacts of AS transcription on L1 transcription 
and retrotransposition 

The synthetic mouse L1 element smL1 retrotransposes 
~200-fold more than endogenous mouse L1s (47). 
Increased RNA polymerase II processivity and increased 
expression of L1 ORF1 and ORF2 were proposed to be 
causes of this increase (47). Compared with mouse smL1, 
a synthetic human L1 (ORFeus-Hs) retrotransposed only 
about 3-fold more than the most active native human L1 
elements (59). The exact basis for the differential increase 
in retrotransposition by synthetic mouse more than synthetic 
human L1s, over the corresponding native elements, 
is unknown. We noted that smL1 lacked the AS promoter 
activity harbored in ORF1 by native mouse L1s 
(Figure 1), thereby plausibly contributing to marked increases 
in its expression and retrotransposition. To test 
this possibility, we replaced native ORF1 in L1spa with 
the synonymous fragment from smL1, forming a novel, 
hybrid full-length L1 donor. To assess the role for A/T 
content in affecting L1 transcript levels, we also 
synthesized a second partially recoded hybrid L1 donor, 
i.e. as in pJL3. Like smL1, the recoded L1 in pJL3 also 
lacked AS promoter activity (Figure 1), but it had higher 
A/T content, similar to that of native mouse L1 elements. 

We also measured transcript levels expressed from these 
native or hybrid L1 donor elements using qRT-PCR. The 
lowest L1 transcript levels were observed for native L1 TF 
(L1spa), whereas the highest levels were seen for full-length 
smL1 (Supplementary Figure S6). Intermediate 
levels were seen for the novel hybrid element containing 
recoded ORF1, harboring no AS promoter activity and 
neutral changes in A/T content, engineered upstream of 
native L1spa (TF) ORF2. Somewhat higher expression 
was seen for the second hybrid L1 element, i.e. smL1/L1spa 
in pMK28, which has lower A/T content in 
ORF1 (47). The results suggested a potential contribution 
by native AS L1 promoters in reducing L1 transcription. 

We also compared mobilization of the various engineered 
L1s (Figure 6) (13). The hybrid L1 with reduced 
ORF1 A/T content in pMK28 retrotransposed at least 
100-fold more than native L1spa (Figure 6). The partially 
recoded hybrid L1 in pJL3, with neutral changes in ORF1 
A/T content, mobilized up to ~39-fold more than native 
L1spa. We conclude that synonymous disruption of the 
AS L1 promoter in ORF1, regardless of its A/T content, 
can increase retrotransposition substantially. These results 
are also consistent with evidence showing that longer L1 
templates bearing reduced A/T content can result in 
increased transcript levels and retrotransposition (47). 
Thus, the AS L1 promoter helps to limit retrotransposition 
in cis. 

To determine whether overexpressed AS L1 transcripts 
could inhibit retrotransposition in trans, first we engineered 
AS smL1 fragments to overexpress them in the 
desired orientation. Four AS fragments from smL1, corresponding 
to AS L1spa coordinates 2119-1120, 2800-1120, 
2119-1812 and 2800-1812, each were cloned downstream 
of the CMV promoter and were co-transfected 
with marked smL1 in a transient retrotransposition 
assay (52) (Figure 6 and Supplementary Figure S7A). As 
a positive control, where smL1 could mobilize in the 
absence of overexpressed AS L1 transcripts in trans, 
empty pCEP4 was co-transfected with smL1. A negative 
control consisted of cells transfected with no smL1 donor 
and pCEP4 alone. The overexpression of AS smL1 transcripts 
in trans significantly suppressed smL1 retrotransposition, 
i.e. by ~50-75% (Figure 6). This level 
of repression was comparable with that of human 
L1 siRNAs (34). In another experiment, several distinct 
native AS L1 TF transcripts (generated from L1spa 
template at coordinates 2823-1286, 2150-1286 and 
2150-1636; cf. Figures 1 and 5, Supplementary Figure 
S3 and Supplementary Table S2) were overexpressed 
(Supplementary Figure S7). These AS L1 transcripts 
overlapped in part with endogenous AS L1 RIFTs 
(Figure 1). Their expression in trans suppressed L1 
retrotransposition at comparable levels, i.e. 2- to 5-fold 
(Supplementary Figure S7B). 
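
The two ways of reporting suppression above (percent reduction relative to a control set to 100%, and fold reduction) are related by simple arithmetic. The sketch below normalizes the mean of duplicate colony counts to the positive control and converts a fold reduction to a percent reduction. The function names and colony counts are hypothetical, and the sketch omits any correction for plating dilution or transfection efficiency that a real assay may require.

```python
def normalized_frequency(sample_counts, control_counts):
    """Mean colony count of the sample as a percentage of the control mean (control = 100%)."""
    sample_mean = sum(sample_counts) / len(sample_counts)
    control_mean = sum(control_counts) / len(control_counts)
    return 100.0 * sample_mean / control_mean


def fold_to_percent_reduction(fold):
    """Convert an n-fold reduction into the equivalent percent reduction."""
    return 100.0 * (1.0 - 1.0 / fold)


# Invented duplicate colony counts for a control and one AS-expressing sample.
control = [100, 140]
with_as_transcript = [25, 35]
print(normalized_frequency(with_as_transcript, control))
print(fold_to_percent_reduction(2), fold_to_percent_reduction(5))
```

By this conversion, a 2- to 5-fold suppression corresponds to a 50-80% reduction, which is in the same range as the ~50-75% suppression reported for the smL1 fragments.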

Modest role of Dicer in limiting native L1 
retrotransposition 

We hypothesized that AS transcripts initiated from the AS 
promoter, expressed together with sense transcripts 
initiated from the conventional 5' promoter of mouse 
L1s, could result in the formation of double-stranded 
(ds) RNAs. In turn, these dsRNAs could trigger formation 
of short interfering RNAs or microRNAs through a 
Dicer-dependent pathway (60,61), thereby reducing sense 
strand L1 transcripts and limiting retrotransposition. We 
tested this possibility by using Dicer knockout cells in a 
retrotransposition assay (23). Because Dicer ex5 -/- 
HCT116 human colorectal cells are NeoR (53), we engineered 
novel L1 donors, marked with the β-lactamase 
TEM1 reporter interrupted by an artificial intron (13). 
Either native or hybrid recoded L1s were transfected 
into HCT116 Dicer ex5 -/- cells and control wild-type 
Dicer cells. After selection on donor plasmids, retrotransposition 
was assayed by qRT-PCR analysis of spliced 
TEM1 transcripts, expressed from new L1 insertions 
(62). The retrotransposition rate of L1spa, which 
contains an active AS promoter, increased slightly, i.e. 
<2-fold, in Dicer -/- cells compared with control cells. 
By contrast, retrotransposition by recoded elements 
lacking AS promoter activity, i.e. pJL3/TEM1, 
pMK28/TEM1 and pCEP4/smL1/TEM1, was essentially 
unchanged in Dicer -/- cells versus control cells 
(Supplementary Figure S8). Thus, Dicer played a modest 
role in suppressing native L1 retrotransposition, mediated 
by AS L1 transcription; most of the suppression by AS L1 
transcripts occurred independent of Dicer. Previous experiments 
showed a similar ~2-fold level of suppression 
of human L1 retrotransposition on knockdown of Dicer 
in cultured cells. That result was interpreted as showing 
the role for Dicer-dependent RNA interference in 
regulating human retrotransposition (34). 

DISCUSSION 

A recent analysis of human and mouse transcriptomes suggested 
that 6-30% of all transcripts are initiated from repetitive 
elements (17). Here, we have identified and 
experimentally characterized an active initiator of such 
transcripts, i.e. an AS promoter within ORF1 of mouse 
L1 retrotransposons, present in thousands of full-length 
copies genome-wide, more than its human counterpart 
(4,23). It initiated a diverse range of fusion transcripts, 
as shown by >100 distinct AS L1 RIFTs identified here 
and elsewhere (Figure 3, Supplementary Figure S3 and 
Supplementary Table S3) (4,45). AS L1 RIFTs included 
spliced, unspliced and/or noncoding RNAs, and were 
readily detected in various mouse cell lines, tissues, developmental 
stages and strains (Figure 4 and Supplementary 
Table S2) (4). In addition to adding significantly to transcriptional 
diversity, AS L1 transcription helped to limit L1 
retrotransposition (Figure 6, Supplementary Figures S6 
and S7). 

Figure 6. AS L1 transcription helps to limit retrotransposition. (A) Cis effects. Native L1 ORF1 sequences in L1spa (black) were replaced either with 

Characterization of an AS L1 promoter and AS 
L1 RIFTs 

The co-existence of a protein-coding sequence together 
with an antiparallel promoter activity in opposite 
overlapping orientations is unusual, but is not unprecedented, 
in mammalian genomes (63-65). 

Many sequence differences, particularly in the 5' UTR 
and within ORF1, distinguished the three active mouse L1 
subfamilies, i.e. TF, GF and A elements (Supplementary 
Figure S2). Several putative transcription factor binding 
sites in the AS promoter sequence of L1spa (50) and other 
TF subfamily elements could be altered by natural 
sequence variants occurring in other L1 subfamilies 
(Supplementary Figure S2). Although members of each 
subfamily retrotransposed recently (4,14,50,66), these 
sequence differences simultaneously could affect both 
their distinct retrotransposition rates, by affecting 
ORF1p structures, and their AS promoter activities. 
We note that a single amino acid substitution in mouse 
ORF1p can affect L1 retrotransposition (67). In addition, 
the recoded synonymous sequences in ORF1 of pMK28 
and pJL3 disrupted numerous predicted transcription 
factor binding sites in the AS promoter (Supplementary 
Figure S2), consistent with a complete lack of AS 
promoter activity observed in those elements (Figure 1). 

The various AS promoter activities associated with 
each L1 subfamily (Figure 1) were roughly proportional 
to the number of RIFTs initiated by them in vivo 
(Supplementary Table S2 and Supplementary Figure S3). 
Thus, we concluded that AS L1 promoter activities ranked 
as L1 TF >> GF ~ A > F. Notably, the latter subfamilies 
possessed modest, but detectable, AS promoter activities. 

Estimated ages, counts and retrotransposition frequencies 
of L1 subfamily members have varied considerably. 
The average ages of L1 TF elements range from 0.25 to 
1.23 million years old (15), and numbers of full-length 
insertions range from 3400 (4,68) to ~4800 (15), whereas 
active and/or polymorphic TF elements ranged from 
~1900 (4) to 3000 (68). The average ages of L1 GF subfamily 
members have been estimated at 0.75 to 2.16 
million years (15). Full-length GF element counts have 
varied from 704 (4) to 1500 (14). There are ~400 (14) 
to 535 (4) active and/or polymorphic L1 GF elements. 
The average ages of the youngest L1 A subfamily 
members have been estimated to range from 0.21 to 2.15 
million years, and older A subfamilies have also been 
identified (15). Full-length A elements have ranged in 
number from 3400 (15) to 6500 (66). There are ~900 
(14) to 1600 (4) active and/or polymorphic L1 A insertions. 
Individual elements of all three subfamilies 
have been shown to retrotranspose at comparable 
frequencies (14). 

These findings prompted us to consider an apparent 
paradox. How might TF subfamily elements harbor the 
strongest AS promoter activity, even though they have 
accumulated to some of the highest copy numbers of 
full-length L1 integrants in the genome (4,15)? We speculate 
that more robust host defenses might be necessitated 
by elements with increased retrotransposition potential, 
thereby resulting in relatively equivalent mobilization 
frequencies of distinct subfamily elements (14). This 
paradox could also be explained by comparing the long 
evolutionary times over which different subfamilies have 
accumulated, moving in germ line tissues under negative 
selection (15), versus the expression of AS L1 RIFTs in 
germ line and somatic tissues, measured in real time. 

Although we detected both sense and AS L1 transcripts 
expressed in the same tissues, including testis and thymus 
(cf. Supplementary Table S2), in this study we have not 
tested whether sense and AS L1 promoters may be active 
simultaneously in single cells. If they are not, the resulting 
unbalanced expression of sense versus AS L1 transcripts in 
distinct cells or tissues could allow particular L1 elements 
to evade this putative defense mechanism. Moreover, 
individual mouse and human L1 elements can mobilize 
over a wide range of frequencies, despite similar ORF sequences 
shared by 'hot' versus 'cool' elements (14,69-71). 
Although we found many diverse AS L1 RIFTs expressed, 
many were expressed at low levels, and many other potentially 
active, distinct L1 elements had no detectable AS 
RIFT expression. 

We used several independent experimental methods 
to identify AS L1 RIFTs (Figure 3 and Supplementary 
Table S2). These included screens of phage cDNA 
libraries, RT-PCR followed by cloning and sequencing, 
bioinformatics surveys of transcript sequence databases, 
northern blots (4) and a novel RIFT assay using RT-PCR 
followed by exon microarray hybridization. 
Considered together with results from 5' RACE analysis 
(Figure 5) and in vitro promoter assays (Figure 1), these 
findings clearly established that many diverse RIFTs were 
expressed from AS promoters located in L1 ORF1 in vivo. 

Many additional AS L1 RIFTs might have been missed 
in our study, owing to a lack of saturation of our screens; 
a limited range of mouse tissues and lineages used in the 
various screens; low expression levels; and/or strict criteria 
imposed in our RIFT assay. Even so, after summing up all 
AS L1 RIFTs observed by various methods, we conclude 
that the robust AS L1 promoter activity characterized 
here still does not account for most of the 6-30% of all 
transcripts initiated from transposons in mouse (17). A 
possible explanation is that other, still unidentified, promoters 
inside or outside of TEs initiate such abundant 
transcripts. We are currently working to identify such potential 
promoters, but, to date, no experimental evidence 
for them has been reported. Alternatively, this reported 
range (17) could dramatically overestimate actual TE-initiated 
transcription. Our phage library screens 
revealed ~0.03 to 0.1% of all transcripts hybridized with 
an L1 ORF2 probe (Figure 1), far less than identified from 
CAGE tags (17). In addition, recent studies in mouse embryonic 
stem cells identified most L1-specific small RNAs 
mapping to both strands of the L1 5' UTR and proximal 
ORF1, but not ORF2 or the 3' UTR (43,44). 

Figure 6. Continued 

downstream of its strong CMV promoter. Each cloned construct was co-transfected into HeLa cells with the smL1 retrotransposition donor plasmid, 
pCEP4/smL1/Neo. As positive and negative controls, smL1 donor alone and pCEP4 alone were transfected into HeLa cells, respectively. After 
transfection, cells were plated at various dilutions, selected on G418 for 2 weeks and NeoR colonies were stained and counted (see Supplementary 
Figure S7). The mean and range of duplicate counts were determined, and retrotransposition frequencies were normalized relative to that of the 
smL1 positive control (defined as 100%). Asterisks: significantly different from control retrotransposition frequency (two-tailed t-test, P < 0.05 in all 
pairwise comparisons). 

The presence of a particular full-length L1 element was 
necessary, but not sufficient, to initiate a locus-specific AS 
L1 RIFT. We found that only 13 (19%) of 68 polymorphic 
full-length L1s initiated AS L1 RIFTs in testis, 
as assayed by RT-PCR. Moreover, some RIFTs were 
expressed only in embryonic, newborn or adult mouse testis, 
whereas smaller numbers were expressed in other organs 
such as brain and kidney (Figure 4). A few AS L1 RIFTs 
were expressed in several tissues (Figure 4). We speculate 
that the determinants of variable initiation of RIFTs by 
various L1s across the genome may include position 
effects, neighboring transcription units, other nearby 
genomic features, tissue-specific factors and/or variable 
chromatin marks (16). Alternatively, certain L1 integrants 
could undergo differential, transcriptional gene silencing 
in situ (72) (Kannan,M. et al., in preparation). 

Biological roles of AS L1 RIFTs 

L1 retrotransposons are actively mobilized in mouse 
and human germ lines, resulting in substantial, ongoing 
structural variation in both genomes (4,5,73,74). In 
addition, L1s may retrotranspose in somatic tissues such 
as the brain, during normal development, and in certain 
cancers, resulting in somatic mosaicism (75-78). Because 
AS promoters (including many polymorphisms) are inherently 
part of many such integrants, they could contribute 
substantially to natural transcriptional variation 
distinguishing between lineages, individuals and even 
cells (4,16). In addition to the robust level of AS L1 
RIFT expression at Arhgap15 (Figure 3), we previously 
reported comparably robust levels of AS L1 RIFT and 
native transcripts at Drosha, as shown by northern blot 
(4). However, aside from these cases, most other mouse 
AS L1 RIFTs appear to be expressed at low levels, as in 
human (27). Further experiments to quantify and compare 
RIFT expression levels versus long noncoding RNAs (79), 
microRNAs and other biologically significant transcripts 
are warranted. 

AS L1 RIFTs frequently can be expressed from 
nonpolymorphic L1 integrants in diverse mouse lineages 
(Figure 4), implying that at least some may share a 
conserved, albeit unknown, biological function. Certain 
expressed RIFTs (Supplementary Table S2) may play 
several distinct biological roles including possible protein 
translation. In some cases, the predicted protein-coding 
ORF sequences of AS L1 RIFTs match the cognate 
ORF in transcripts from the associated native genes, suggesting 
that although they may encode identical proteins, 
their expression patterns may be added to, or modified by, 
the AS L1 promoter. Other AS L1 RIFTs may modify or 
replace cognate protein structures or expression, generate 
novel proteins or long noncoding RNAs (25,26,46,80) or 
introduce different 5' UTR sequences that could alter 
translational regulation. Transcripts that are AS to canonical 
sense transcripts could play other roles including 
degradation of sense strand transcripts through RNA 
interference or Dicer-independent mechanisms (42), 
variable compartmentalization and/or effects on transcript 
splicing and termination, RNA editing and translation 
(25,42,65). 

We also found that AS L1 transcription limited L1 
retrotransposition, as demonstrated both by altered L1 
transcript levels (Supplementary Figure S6) and mobilization 
on synonymous recoding of the AS L1 promoter in 
ORF1 in cis and upon overexpression of AS L1 RIFTs in 
trans (Figure 6 and Supplementary Figure S7). Hybrid 
L1s, containing either a recoded synonymous ORF1 
segment from smL1 with decreased A/T content (47) or 
a second recoded ORF1 segment with neutral changes in 
A/T content, exhibited higher rates of retrotransposition 
than that of native L1spa (Figure 6). The native AS L1 
promoter could inhibit L1 retrotransposition in cis by 
triggering transcriptional interference, i.e. convergent, bidirectional 
transcription (81). Expression of AS L1 
transcripts alternatively could result in formation of 
double-stranded (ds) RNA molecules that could affect 
chromatinization and silencing of the L1 template (42) 
or trigger an interferon response (82). Such dsRNAs 
could form substrates for processing to small inhibitory 
RNAs through Dicer-dependent (60) or -independent 
mechanisms (42). Interestingly, a modest number of 
~23-nt small RNAs that map to the mouse L1 5' UTR 
region recently were identified in testis and in full-grown 
and meiosis I oocytes (83). In addition, both sense and AS 
small RNAs, mapping to the 5' end of mouse L1 elements, 
have been identified in mouse ES cells (43,44). Thus, both 
human and mouse L1 retrotransposition can be inhibited 
by RNAi in various cellular contexts (34,43,44). 

We showed that Dicer played a modest <2-fold role 
in suppression of endogenous mouse L1 elements 
(Supplementary Figure S8), the only elements capable 
of triggering dsRNAs that were tested here. By contrast, 
Dicer appeared to be a crucial component in RNAi-mediated 
regulation of L1 expression and mobilization 
in mouse ES cells (43,44). We found that retrotransposition 
of pJL3/TEM1 was higher than that of 
pTN201/TEM1, even without Dicer (Supplementary 
Figure S8). For this reason, we speculate that the RNAi 
pathway is not likely to be the predominant suppressive 
mechanism of mouse L1 elements, and that other suppressive 
mechanisms are involved, at least in the differentiated 
somatic cells tested here (Supplementary Figure S8). 
Thus, we conclude that AS L1 transcripts act mostly 
independent of Dicer in decreasing L1 expression and 
retrotransposition. 

In summary, we conclude that mouse L1s encode a 
built-in mechanism that regulates themselves and alters 
expression of neighboring genes. We note that a similar organization 
of bidirectional promoters resides in most other 
classes of autonomous mammalian retrotransposons, 
including human L1s and mouse and human LTR retrotransposons 
(54,60,84-86). Interestingly, bidirectional 
transcription at a particular mouse SINE B2 element 
was found to help establish an insulator or boundary 
element that, in turn, is critical to the developmental regulation 
of a neighboring gene (54). The evolutionary implications 
of such self-antagonizing promoters may be that 
transposons, including mouse L1 retrotransposons, can 
thereby limit their own expression. This would reduce 
their deleterious effects and costs to the fitness of their 
host (87), while exaptively modifying and diversifying 
the structure, expression and control of many other 
genes (25,84). 